Speech emotion recognition (SER) is an important research topic. Image-like features such as spectrograms are a common way of extracting information from speech. In image recognition, a relatively novel type of network called the capsule network has shown promising results. This study uses capsule networks to encode spatial information from spectrograms and analyses their performance when paired with different loss functions. Experiments comparing the capsule network with models from previous works show that the capsule network outperforms them.